Towards Business Intelligence over Unified Structured and Unstructured Data Using XML
Authors
Abstract
Traditional data warehousing has been very successful in helping business enterprises make intelligent decisions through declarative analysis of large amounts of structured data stored in relational databases. However, not all enterprise data naturally fit a relational model. Within an enterprise, there are huge amounts of unstructured data, such as document content, emails, and spreadsheets, that have no fixed schema, or have a schema so sparse or loose that it cannot be modeled effectively with the relational model. Yet, like relational data, unstructured data record many useful facts that are equally important for businesses to analyze in order to make intelligent decisions. In this chapter, we propose an XML-enabled RDBMS that uses XML as the underlying logical data model to uniformly represent well-structured relational data, semi-structured data, and unstructured data, in order to build an enterprise data warehouse that can store and analyze any data regardless of whether a schema exists. We show how XQuery, used within SQL/XML, serves as a declarative language for data query, analysis, and transformation over both structured data and unstructured content in the data warehouse. We present the rationale for using XML as the logical data model for unified data warehouse queries, and for using an XML-extended inverted text index to integrate structured data query with context-aware full-text search over unstructured content, so as to support efficient data analysis over large volumes of structured and unstructured data. We argue that using XML to unify structured and unstructured data in a warehouse has the potential to push business intelligence over all enterprise data into a new era.
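The idea of context-aware full-text search over a unified XML representation can be illustrated with a minimal sketch. This is not the chapter's system and it does not use XQuery or SQL/XML; it uses Python's standard `xml.etree.ElementTree` as a stand-in, and the sample documents, element names, and the helper functions `contains_word` and `search` are all hypothetical, invented for illustration. A relational row and two emails are expressed as XML, and a keyword search is optionally restricted to text under a named element.

```python
import xml.etree.ElementTree as ET

# Hypothetical warehouse content: one relational row and two emails,
# all expressed as XML so that a single query can cover them.
DOCS = [
    "<order><customer>Acme</customer><amount>1200</amount>"
    "<note>Customer asked about delayed shipment</note></order>",
    "<email><from>bob@example.com</from><subject>Shipment delayed</subject>"
    "<body>The Acme shipment is delayed by two weeks.</body></email>",
    "<email><from>eve@example.com</from><subject>Lunch</subject>"
    "<body>No business content here.</body></email>",
]


def contains_word(element, word):
    """Case-insensitive whole-token match over all text nested under element."""
    text = " ".join(element.itertext()).lower()
    return word.lower() in text.split()


def search(docs, word, context=None):
    """Return the root tag of each document in which `word` occurs.

    If `context` is given, only text under elements with that tag counts,
    which is a simple form of context-aware full-text search.
    """
    hits = []
    for raw in docs:
        root = ET.fromstring(raw)
        scopes = root.iter(context) if context else [root]
        if any(contains_word(scope, word) for scope in scopes):
            hits.append(root.tag)
    return hits
```

Calling `search(DOCS, "delayed")` matches both the order row and the first email, while `search(DOCS, "delayed", context="subject")` matches only the email, because the order has no `subject` element. A real system would answer such queries from an index rather than by scanning every document, as the abstract's mention of an XML-extended inverted text index suggests.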
Similar resources
Text Analytics to Data Warehousing
Information hidden or stored in unstructured data can play a critical role in making decisions, understanding and conducting other business functions. Integrating data stored in both structured and unstructured formats can add significant value to an organization. With the extent of development happening in Text Mining and technologies to deal with unstructured and semi-structured data like X...
Semantic Web Mining of Un-structured Data: Challenges and Opportunities
The management of unstructured data is acknowledged as one of the most critical unsolved problems in data management and business intelligence today. The main reason the problem remains unresolved is that the methods, systems and related tools that have established themselves so successfully converting structured information into business intel...
Panel: One Platform for Mining Structured & Unstructured Data: Dream or Reality?
Although enterprises commonly utilize sophisticated data integration technology and business intelligence tools for analysis of structured data, analysis of unstructured data is a separate process and is often limited to capabilities supported by a search engine. Users have separate and vastly different interfaces for structured and unstructured data: Business Intelligence for s...
Answering Structured Queries on Unstructured Data
There is a growing number of applications that require access to both structured and unstructured data. Such collections of data have been referred to as dataspaces, and Dataspace Support Platforms (DSSPs) were proposed to offer several services over dataspaces, including search and query, source discovery and categorization, indexing and some forms of recovery. One of the key services of a DSSP ...
Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML
As we move to a big data world where unstructured data is growing at high velocity, there is a need for a data store that can hold this large amount of data. Traditionally, relational databases have been used, but they are not suited to handling data at this scale, so a move to non-relational data stores is needed. In the current study, we have proposed an extension of the Mongo...
Journal title:
Volume, issue
Pages -
Publication date: 2012